Some Size Optimization
Introduction
This article describes a small (and hopefully useful) utility which most coders need from time to time: the 'raw binary file-2-asm source' thing. Some people might find it interesting for another reason: it's only 72 bytes long. It demonstrates some widely known size optimization tricks used to reach this size. No doubt it could be even smaller... (I'm betting that a certain person with an '.NL' email address will "chew some bytes".)
Anyway, let's get on with the code!!
Hey, your comments suck!
Yeah, but not as much as my code (heheh)..
Next to some instructions I've written an [n] number in the comments field; these parts are explained more fully later. Here's the entire source!
.MODEL TINY ; [1]
.CODE
ORG 256
go: mov ax, 823Dh ; [2]
mul si
mov bl, [si-80h] ; [3]
mov [si+bx-7Fh], al ; [4] mark end of filename
lp: int 21h ; [5] open the file/print string
jc bye+1
bye: mov dx, 01C3h ; [6] load at [DS:01C3] hex
mov ah, 3Fh
mov cl, 8 ; [7]
mov bl, 5 ; [8] (BX)=file handle
int 21h ; load up to 8 bytes
xchg ax, cx
jcxz bye+1 ; [9] read 0 bytes (EOF) ?
lea di, txt+5
mov si, dx
prt: mov al, '0'
stosb
lodsb ; get byte to convert
aam 16 ; [10] AH=hi nibble, AL=lo nibble
hex: xchg ah, al
cmp al, 10
sbb al, 69h
das ; [11] convert(AL) into ASCII
stosb
xor ch, cl
jnz hex ; [12] toggle z/nz (loop 2x)
mov ax, ',h'
stosw ; add 'h,' (AL='h' is stored first, then AH=',')
loop prt
mov ax, 0924h ; [13] (AL)=char '$', (AH)=func 9
dec di
stosb ; [14] mark end with '$'
mov dl,(txt-go) ; [15]
jmp lp
txt db 13,10,'db',9 ; new-line string
END go
Those pesky numbers
Okay, the comments were optimized a little too much in the above... Let me explain the reasons for doing some of those strange things.
.MODEL TINY ; [1]
.CODE
ORG 256
First part of any size-optimized program: build it as a .COM program. This avoids the .EXE header, the relocations and the register set-up of a normal .EXE program. The register values when a .COM program starts are:
CS = DS = ES = SS ; = code segment (16 bit, 64 KB)
AX=0000 BX=0000 CX=00FF DX=CS
SI=0100 DI=FFFE BP=09xx SP=FFFE
IP=0100
DF=0 IF=1
Anyone on the Hugi size coding compo mailing list will instantly recognize the above... pity that some debuggers like TD don't..
go: mov ax, 823Dh ; [2]
mul si
The strange looking instruction pair above makes (AX)=3D00h and (DX)=0082h but only takes 5 bytes instead of the usual 6 which two "MOV reg16,xxxx" instructions take up. The (AL)=00 part is needed twice: as the read-only access mode for the (AH)=3Dh open function and as the terminator byte stored at [4]. The following instructions load the same (DX) and (AH) values in the same number of bytes, but the MUL method makes (AL)=00 for free.
go: mov dx, 0082h ; [2]
mov ah, 3Dh
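The arithmetic behind the MUL trick is easy to check away from DOS. Here is a minimal sketch in Python, used purely as a calculator; the helper name `mul16` is made up for this illustration, and the start-up value SI=0100h is the assumption listed in the register table above.

```python
def mul16(ax, operand):
    """Model of the 16-bit x86 MUL: DX:AX = AX * operand (unsigned)."""
    product = (ax & 0xFFFF) * (operand & 0xFFFF)
    return (product >> 16) & 0xFFFF, product & 0xFFFF   # (DX, AX)

SI_AT_START = 0x0100            # assumed .COM start-up value of SI
dx, ax = mul16(0x823D, SI_AT_START)
print(f"DX={dx:04X} AX={ax:04X}")
```

Multiplying by 0100h is just a shift left by 8 bits, so the high byte of the constant (82h) ends up in (DL) and the low byte (3Dh) ends up in (AH).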
The following marks the end of the filename on the command-line with a 00 byte terminator; without it DOS will fail to open the file. The byte at 0080h holds the length of the command-line text, which itself starts at 0081h.
mov bl, [si-80h] ; [3]
mov [si+bx-7Fh], al ; [4] mark end of filename
It's the same as this, but 2 bytes smaller..
mov bl, [0080h] ; [3]
mov [bx+0081h], al ; [4] mark end of filename
I've used (SI) as a base address because it saves 1 byte in each of the above lines of code. An 8-bit displacement takes 1 byte, whereas a 16-bit address obviously takes 2 bytes. The (SI) register has a default value of 0100 hex (256 decimal).
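To see where the terminator lands, here is a small Python model of the command-line area. It is only a sketch: the bytearray `mem` stands in for the first 256 bytes of the segment, the command tail " mydata.dat" is a hypothetical example, and (BH)=00 and SI=0100h are the start-up assumptions from the register table.

```python
SI = 0x0100                      # assumed start-up value of SI
mem = bytearray(0x100)           # stand-in for offsets 0000h-00FFh

tail = b" mydata.dat"            # hypothetical command tail, leading space included
mem[0x80] = len(tail)            # the tail length is stored at 0080h
mem[0x81:0x81 + len(tail)] = tail
mem[0x81 + len(tail)] = 0x0D     # the tail ends with a carriage return

bx = mem[SI - 0x80]              # mov bl,[si-80h]      (BH assumed 0)
mem[SI + bx - 0x7F] = 0x00       # mov [si+bx-7Fh],al   (AL is 0 after the MUL)

name = mem[0x82:mem.index(0, 0x82)].decode("ascii")
print(name)
```

The 00 byte replaces the carriage return, and the name that (DX)=0082h points at skips the leading space at 0081h.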
lp: int 21h ; [5] open the file/print string
The above INT 21h instruction is used twice, once to open the file and once at the end to print the string using function AH=09h.
bye: mov dx, 01C3h ; [6] load at [DS:01C3] hex
The above address 01C3 hex has a special purpose, the low-byte (C3 hex) is used as a hidden RET instruction. This is a technique called 'Overlapping Opcodes' or 'Hidden Opcodes'. The idea is simple, take a multi-byte instruction and then jump into the middle of it. The opcode bytes for the "MOV DX,01C3h" are BA C3 01 hex, so by jumping past the 1st byte we hit C3 hex which is a "RET" instruction.
The high-byte (01 hex) has a special purpose too. The 'txt' string shares the same high-byte address, so this saves another byte over a normal "LEA DX,txt" or "MOV DX,xxxx" instruction.
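The two ways of reading those three bytes can be shown with a toy decoder. This Python sketch knows only the two opcodes quoted above; `decode` is a made-up helper, not a real disassembler.

```python
code = bytes([0xBA, 0xC3, 0x01])    # encoding of MOV DX,01C3h quoted above

def decode(buf, offset):
    """Decode one instruction at 'offset'; only BAh and C3h are modelled."""
    op = buf[offset]
    if op == 0xBA:                                   # MOV DX,imm16
        imm = buf[offset + 1] | (buf[offset + 2] << 8)
        return f"MOV DX,{imm:04X}h"
    if op == 0xC3:                                   # near RET
        return "RET"
    raise ValueError(f"opcode {op:02X}h not modelled")

print(decode(code, 0))   # entered normally, at 'bye'
print(decode(code, 1))   # entered through 'jc bye+1' or 'jcxz bye+1'
```

Entered at 'bye' the bytes are a load of (DX); entered one byte later the CPU sees a RET, so both exit paths cost no extra bytes.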
mov cl, 8 ; [7]
Assume that (CH)=00, this saves 1 byte over a "MOV CX,0008" instruction.
mov bl, 5 ; [8]
This is a very naughty method. It assumes that the file handle for the first opened file is 5 (handles 0 to 4 normally belong to the standard devices), which under certain conditions it ain't!! I really should have saved the file handle returned by the (AH)=3Dh open function and used that, but hey, this saved a few bytes around the INT 21h.
xchg ax, cx
jcxz bye+1 ; [9] read 0 bytes (EOF) ?
If we hit the EOF (End-Of-File) the (CX) register will be 0, otherwise it will hold a number from 1 to 8 denoting the number of bytes read in.
aam 16 ; [10] AH=hi nibble, AL=lo nibble
The very useful BCD instructions!!! (Ah, my favourite.) The above splits the byte in the (AL) register into two parts: the high nibble (bits 7..4) is placed in (AH) and the low nibble (bits 3..0) is placed in (AL), which is very useful for hex-2-ascii routines.
AH = AL div 16
AL = AL mod 16
Check out st0ne's nice article about the other BCD instructions in Hugi 17, well worth the read.
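The nibble split and the CMP / SBB / DAS conversion marked [11] in the listing can be followed step by step with a little Python model. This is a sketch of the flag arithmetic as I understand it, not output from a real CPU; the function names are invented for the example.

```python
def aam16(al):
    """AAM 16: returns (AH, AL) = (AL div 16, AL mod 16)."""
    return al >> 4, al & 0x0F

def nibble_to_ascii(al):
    """CMP AL,10 / SBB AL,69h / DAS for AL in 0..15, returned as a character."""
    cf = 1 if al < 10 else 0                      # CMP AL,10: CF=1 when AL < 10
    af = 1 if (al & 0x0F) - 0x09 - cf < 0 else 0  # SBB: borrow out of the low nibble
    cf_after = 1 if al - 0x69 - cf < 0 else 0     # SBB: borrow out of the byte
    al = (al - 0x69 - cf) & 0xFF
    old_al = al
    if (al & 0x0F) > 9 or af:                     # DAS, low-nibble adjust
        al = (al - 0x06) & 0xFF
    if old_al > 0x99 or cf_after:                 # DAS, high-nibble adjust
        al = (al - 0x60) & 0xFF
    return chr(al)

def hex_byte(value):
    """Both steps together: split with AAM 16, then convert each nibble."""
    ah, al = aam16(value)
    return nibble_to_ascii(ah) + nibble_to_ascii(al)

print(hex_byte(0x4D), hex_byte(0x0A), hex_byte(0xFF))
```

For 0..9 the CMP sets the carry, so the SBB subtracts 6Ah and DAS takes off another 66h, landing on '0'..'9'. For 10..15 the carry is clear, only 69h and then 60h are subtracted, and the result lands on 'A'..'F'.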
xor ch, cl
jnz hex ; [12] toggle z/nz (loop 2x)
Because (CX) is always between 1 and 8 here, (CH) is 00 and (CL) is non-zero, so the (CH) register can be used as a nice loop counter with a repeat of two (once for each nibble) by toggling it with the value in the (CL) register: the first XOR makes (CH) non-zero, the second makes it 00 again. You can of course use NOT, XOR or even CMC to perform a similar task depending on the loop itself.
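The XOR instruction also clears the CF (carry flag), and none of the instructions between it and the JMP back to 'lp' change CF again. This is important because the same INT 21h / JC pair is used both to open the file and to print the string: if CF were still set after the print call, the JC would end the program.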
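The two-pass behaviour of the toggle can be checked for every possible (CL) with a few lines of Python. This is only a model of the XOR / JNZ pair, assuming (CH)=00 on entry as described at [7]; the function name is made up.

```python
def hex_loop_passes(cl):
    """Count trips through the 'hex' loop for one byte, given CL and CH=0."""
    ch = 0                       # CH is assumed to be 0 on entry
    passes = 0
    while True:
        passes += 1              # loop body: convert and store one nibble
        ch ^= cl                 # xor ch,cl
        if ch == 0:              # jnz hex falls through once CH is 0 again
            break
    return passes, ch

for cl in range(1, 9):
    print(cl, hex_loop_passes(cl))
```

Every value of (CL) from 1 to 8 gives exactly two passes and leaves (CH) at 00, ready for the next byte.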
mov ax, 0924h ; [13] (AL)=char '$', (AH)=func 9
The above loads two 8-bit registers at the same time. This saves 1 byte over a "MOV AH,09h" and "MOV AL,24h" combination.
dec di
stosb ; [14] mark end with '$'
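Mark the end of the current line buffer with '$' (hex 24) as needed by the stoopid Int 21h, Function 9 call. The DEC DI steps back onto the comma written after the last byte, so the '$' overwrites it and the printed line does not end with a stray ','. The above saves 1 byte over a "MOV [DI-1],AL" instruction.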
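Here is how the line buffer looks just before the print call, modelled in Python. It is a sketch only: the two data bytes are hypothetical, and the end of the bytearray plays the role of (DI).

```python
buf = bytearray(b"\r\ndb\t")     # the 'txt' prefix; DI starts just past it (txt+5)

for byte in (0x4D, 0x5A):        # two hypothetical data bytes
    buf += b"0"                          # mov al,'0' / stosb
    buf += f"{byte:02X}".encode()        # the two converted nibbles
    buf += b"h,"                         # stosw: 'h' first, then ','

buf[len(buf) - 1] = ord("$")     # dec di / stosb: the last ',' becomes '$'
print(bytes(buf))
```

The result is the bytes for CR, LF, 'db', TAB, then "04Dh,05Ah$", which Function 9 prints up to (but not including) the '$'.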
mov dl,(txt-go) ; [15]
The (DH) register is already 01 hex, so only the low-byte of the address needs to be loaded here. This saves 1 byte over a normal "MOV DX,xxxx".
How to use it
Almost forgot. First compile it as a .COM program.
TASM raw2
TLINK /t raw2
Then give it a filename on the command-line and redirect its screen output to any file you wish. I normally use the '.DB' extension, coz I'm too thick to think up anything more imaginative, also because using .ASM and .INC as a file-extension is far too dangerous... e.g.:
RAW2 mydata.dat > mydata.db
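If you want to check the redirected output against something, here is a rough reference model in Python of the text I would expect the utility to produce. It is a sketch based on reading the source above, not a replacement for the tool; `raw2_reference` is a made-up name.

```python
def raw2_reference(data, per_line=8):
    """Expected text for 'data': CR LF 'db' TAB, then up to 8 bytes as 0XXh."""
    lines = []
    for i in range(0, len(data), per_line):
        chunk = data[i:i + per_line]
        lines.append("\r\ndb\t" + ",".join(f"0{b:02X}h" for b in chunk))
    return "".join(lines)

print(raw2_reference(bytes(range(10))))
```

Note that every line starts with the new-line pair rather than ending with one, so the output file begins with an empty line.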
Final thought.
Oh well, that's all folks... I know it's not a ground-breaking article, but someone might find it useful. There have been so few articles about size optimization (or 'size coding' as Adok likes to call it) in the past that I thought one was needed.
You should find the 'full' source code in the bonus pack together with a safer version which does not assume the file-handle is 5.
Happy optimizing...